Abstract



We study the first-order phase transition in the model of a simple perceptron with continuous weights and a large but finite number of inputs. Drawing the analogy with the usual finite-size physical systems, we calculate the shift and rounding exponents near the transition point. In the case of a general perceptron with a larger variety of outputs, the analysis gives only bounds for these exponents.

Some time ago W. Nadler and W. Fink […] showed, for the model of the perceptron, that the transition from storable to unstorable pattern set sizes obeys finite-size scaling (FSS) behavior. This transition is characterized by the absence of an intrinsic length scale, such as the correlation length of the usual phase transitions.

Similar "geometrical" phase transitions without an intrinsic length occur in the satisfiability of random Boolean expressions […], the connectivity of random graphs […], the quasispecies model of molecular evolution […], etc. All these models exhibit sharp transitions for large system sizes, characterized by universal scaling functions that describe the size-dependent effects near the threshold. Recently it was also shown that the statistical-mechanics study of the K-SAT model is very useful for solving the hard computational NP-complete problem it represents, since the nature of the transitions occurring in it may help to improve the efficiency of search algorithms […].

In the present letter we study the FSS behavior of perceptrons in the context of the usual FSS analysis known from various physical systems […]-[…]. For this purpose we use some of the standard definitions of the shift and rounding of the transition, which occur when the size of the system is finite.

The system we are interested in is a single-layer perceptron storing a set of input patterns $\xi_i^\mu$, $i = 1,\ldots,N$; $\mu = 1,\ldots,p$, drawn from a Gaussian distribution. By $N$ and $p$ we denote the numbers of inputs and patterns, respectively.

It is well known [10]-[12] that for the system with one output unit, Gaussian inputs and continuous couplings, the fraction of all possible input-output relations of size $\alpha = p/N$ that can be stored, called $P(\alpha,N)$, exhibits a smooth transition for finite $N$, which becomes discontinuous (step-like) at $\alpha_c = 2$ in the thermodynamic limit $N \to \infty$ (see Fig. 5.11 in ref. [11] or Fig. 3.4 in ref. [12]). The exact analytical expression for $P(\alpha,N)$ is [10]:

$$P(\alpha,N) = 2^{1-p}\sum_{k=0}^{N-1}\binom{p-1}{k}. \tag{1}$$
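A minimal Python sketch of eq. (1), assuming $p = \alpha N$ is rounded to the nearest integer; the function name `storage_fraction` is hypothetical. The binomial sum is done in exact integer arithmetic, so at $\alpha = 2$ the result is exactly 1/2 for every $N$ (the sum then covers exactly half of the symmetric binomial coefficients of order $2N-1$).

```python
from math import comb


def storage_fraction(p: int, N: int) -> float:
    """Eq. (1): P = 2**(1 - p) * sum_{k=0}^{N-1} C(p - 1, k) for p patterns and N couplings."""
    if p < 1 or N < 1:
        raise ValueError("p and N must be positive integers")
    # math.comb(n, k) returns 0 when k > n, so p <= N gives P = 1 automatically
    count = sum(comb(p - 1, k) for k in range(N))
    # exact integer ratio; Python rounds int / int to the nearest float only once
    return count / 2 ** (p - 1)


# hypothetical demo: P at alpha = 1.5, 2.0, 2.5 for growing N (p = alpha * N rounded)
for N in (10, 100, 400):
    values = [storage_fraction(round(a * N), N) for a in (1.5, 2.0, 2.5)]
    print(f"N={N:4d}  P(1.5)={values[0]:.4f}  P(2.0)={values[1]:.4f}  P(2.5)={values[2]:.4f}")
```

As $N$ grows, the printed values at $\alpha = 1.5$ and $\alpha = 2.5$ move toward 1 and 0, respectively, which is the step-like limit described above.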

When the size of the system $N$ is large, eq. (1) takes the asymptotic form:

$$P(\alpha,N) \simeq \frac{1}{2}\left[1 + \mathrm{erf}\left(\sqrt{\frac{N}{2\alpha}}\,(2-\alpha)\right)\right], \tag{2}$$

revealing FSS behavior with the scaling parameter

$$y = (\alpha - \alpha_c)\,N^{1/\nu} \tag{3}$$

and scaling exponent $\nu = 2$ near the transition point $\alpha_c = 2$. Plotting $P(\alpha,N)$ against the scaling parameter $y$ collapses all the curves for different $N$ onto a single one […].
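The following sketch compares eq. (1), eq. (2) and a limiting collapse curve at fixed scaling parameter $y$ from eq. (3). The helper names are hypothetical, and the closed form $\tfrac{1}{2}\,\mathrm{erfc}(y/2)$ is our own $N \to \infty$ limit of eq. (2) at fixed $y$ (where $\alpha \to 2$), not a formula quoted from the text.

```python
import math
from math import comb


def p_exact(p: int, N: int) -> float:
    """Eq. (1): 2**(1 - p) * sum_{k=0}^{N-1} C(p - 1, k)."""
    return sum(comb(p - 1, k) for k in range(N)) / 2 ** (p - 1)


def p_asymptotic(alpha: float, N: float) -> float:
    """Eq. (2): large-N form of eq. (1)."""
    return 0.5 * (1.0 + math.erf(math.sqrt(N / (2.0 * alpha)) * (2.0 - alpha)))


def scaling_variable(alpha: float, N: float, alpha_c: float = 2.0, nu: float = 2.0) -> float:
    """Eq. (3): y = (alpha - alpha_c) * N**(1/nu)."""
    return (alpha - alpha_c) * N ** (1.0 / nu)


def collapse(y: float) -> float:
    """N -> infinity limit of eq. (2) at fixed y (so alpha -> 2): (1/2) erfc(y / 2)."""
    return 0.5 * math.erfc(y / 2.0)


rows = []
for N in (50, 200, 1000):
    for y_target in (-2.0, 0.0, 2.0):
        p = round((2.0 + y_target / math.sqrt(N)) * N)  # nearest integer number of patterns
        alpha = p / N
        y = scaling_variable(alpha, N)  # actual y after rounding p
        rows.append((N, y, p_exact(p, N), p_asymptotic(alpha, N), collapse(y)))
        print(f"N={N:5d}  y={y:+.3f}  eq.(1)={rows[-1][2]:.4f}  "
              f"eq.(2)={rows[-1][3]:.4f}  collapse={rows[-1][4]:.4f}")
```

For $N = 1000$ the three columns agree to within roughly 0.01, and the residual differences shrink as $N$ grows.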

Because of the mean-field character of the model, the usual concepts of length and dimensionality become ambiguous. To compensate for the lack of a natural geometric description in this case, one can choose the number of particles (or, in our case, the number of inputs) as the finite-size parameter, and the dimensionality of the system can be considered arbitrary. Note that the standard finite-size scaling hypotheses, which involve the notion of a correlation length, need a suitable extension for such systems [13]. It is easy to draw the analogy between the scaling parameter $y$ for the perceptron and the corresponding scaling parameter for infinitely coordinated systems […], defined through the coherence number, which replaces the usual correlation length. The relation between the two models leads to the same scaling form, eq. (…), expressed in terms of the corresponding critical parameter.

Following the analogy with conventional first-order transitions […]-[…], we define the transition point $\alpha_c(N)$ as the value of the parameter $\alpha$ at which the absolute value of the derivative $\partial P(\alpha,N)/\partial\alpha$ shows a maximum for large but finite $N$ ($N$ being the size of the system) [14]. This derivative diverges at $\alpha_c = 2$ in the thermodynamic limit.

Using the above scheme and eq. (1), we calculated¹ the inflection point of the function $P(\alpha,N)$ with respect to the parameter $\alpha$ for different values of $N$, and we identified the so-called shift exponent $\lambda$ […]-[…]:

$$\alpha_c(N) - \alpha_c(\infty) \sim N^{-\lambda}. \tag{4}$$

We obtained the following dependence of the critical storage capacity $\alpha_c(N)$:

$$\alpha_c(N) - \alpha_c(\infty) = \frac{e^{0.0328 - 0.00006N}}{N^{1.00906}}, \tag{5}$$

for $N$ running from $N = 8$ to $N = 400$, and

$$\alpha_c(N) - \alpha_c(\infty) = \frac{e^{0.00571}}{N^{\ldots}}, \tag{6}$$

for $N$ between $N = 100$ and $N = 400$ (Fig. 1).
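Because $p$ is an integer, one way to read the inflection point directly off eq. (1) is to look for the steepest finite-difference step $P(p+1,N) - P(p,N)$. The sketch below does this with exact rational arithmetic. The helper names and the convention of placing a step at $\alpha = (p + 1/2)/N$ are our assumptions, so it illustrates the scaling but does not reproduce the fitted constants of eqs. (5)-(6). In this discrete version the two steepest steps are tied at $p = 2N-2$ and $p = 2N-1$, which places the transition at $\alpha = 2 - 1/N$, i.e. a shift of magnitude $1/N$.

```python
from fractions import Fraction
from math import comb


def p_exact(p: int, N: int) -> Fraction:
    """Eq. (1) as an exact rational number."""
    return Fraction(sum(comb(p - 1, k) for k in range(N)), 2 ** (p - 1))


def step(p: int, N: int) -> Fraction:
    """Finite difference P(p + 1, N) - P(p, N) of eq. (1), equal to -C(p - 1, N - 1) / 2**p."""
    return Fraction(-comb(p - 1, N - 1), 2 ** p)


# cross-check the closed form of the finite difference against eq. (1) itself
assert all(step(p, 7) == p_exact(p + 1, 7) - p_exact(p, 7) for p in range(1, 30))


def steepest_alpha(N: int) -> Fraction:
    """alpha = (p + 1/2) / N of the steepest downward step, averaged over ties (needs N >= 2)."""
    steps = [(step(p, N), p) for p in range(N, 4 * N)]
    best = min(s for s, _ in steps)  # most negative step = steepest descent
    ties = [p for s, p in steps if s == best]
    return Fraction(sum(2 * p + 1 for p in ties), 2 * N * len(ties))


for N in (8, 50, 100, 400):
    a = steepest_alpha(N)
    print(f"N={N:4d}  alpha*(N)={float(a):.6f}  N*(alpha*(N) - 2)={float(N * (a - 2)):+.3f}")
```

The last column prints `-1.000` for every $N$, so the magnitude of the shift scales as $N^{-1}$, consistent with $\lambda = 1$.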

It becomes evident that for very large $N$ the following scaling dependence holds:

$$\alpha_c(N) - \alpha_c(\infty) \sim \frac{1}{N}. \tag{7}$$

Using the definition of the shift exponent, eq. (4), we obtain $\lambda = 1$.

Another way to calculate $\lambda$ is to perform analytical expansions for small shifts from the transition point $\alpha_c = 2$, using the asymptotic expression, eq. (2). A straightforward calculation leads to the same result for the shift exponent, i.e., $\lambda = 1$. Note that this value of the shift exponent is obtained by defining the shift of the transition point as the position of the maximum of $|\partial P(\alpha,N)/\partial\alpha|$, as is usually done in the FSS analysis of physical systems […]. This value of $\lambda$ does not coincide with the value $1/\nu = 1/2$ that one would usually expect from the analogy with FSS in physical systems. As the present model shows, the coincidence of the two exponents is not a necessary condition for FSS to hold.
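One way to carry out the expansion mentioned above, assuming the asymptotic form (2) is accurate near $\alpha = 2$, is to locate the zero of the logarithmic derivative of $|\partial P/\partial\alpha|$; "const" collects the $\alpha$-independent prefactor. The derivation fixes the magnitude of the shift; in this form the steepest point lies on the $\alpha < 2$ side.

```latex
\begin{aligned}
\ln\left|\frac{\partial P(\alpha,N)}{\partial\alpha}\right|
  &= \ln(2+\alpha) - \tfrac{3}{2}\ln\alpha - \frac{N(2-\alpha)^2}{2\alpha} + \text{const},\\
\frac{d}{d\alpha}\ln\left|\frac{\partial P(\alpha,N)}{\partial\alpha}\right|
  &= \frac{1}{2+\alpha} - \frac{3}{2\alpha} + \frac{N(4-\alpha^2)}{2\alpha^2},\\
\alpha = 2+\delta:\qquad 0 &= -\tfrac{1}{2} - \tfrac{N\delta}{2} + O(\delta,\,N\delta^2),\\
\Longrightarrow\qquad \delta &= -\frac{1}{N} + O(N^{-2}),
\qquad |\alpha_c(N) - 2| \simeq \frac{1}{N},
\qquad \lambda = 1 .
\end{aligned}
```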



¹ Here we would like to mention that the time needed to find a solution diverges with $N$, reminiscent of critical slowing down.
Figure 1: The critical storage capacity $\alpha_c(N)$ for values of $N$ within the interval …



The finite shift of the critical point shows that the perceptron belongs to the class of systems undergoing asymmetric phase transitions. For symmetric phase transitions the shift is zero; a typical example is the field-driven transition in the Ising model, where the susceptibility has the symmetry $\chi(h,T,L) = \chi(-h,T,L)$ (here $h$ is the external field and $L$ is the finite size of the system). A finite shift in our case therefore means that the corresponding symmetry is broken, which is usually the case for temperature-driven transitions.

The result (7) is similar to the well-known result for asymmetric temperature-driven first-order transitions in finite $d$-dimensional systems with cubic symmetry, where the shift of the location of the maximal slope scales like $L^{-d}$ ($L$ being the finite size) […]-[…]. Although our system is effectively finite (with size $N$) in one dimension, we regard this analogy as purely formal, because of the mean-field character of the interactions in the present model and the absence of any imposed boundary conditions.

Apart from the definition of the transition point by the maximum of $|\partial P(\alpha,N)/\partial\alpha|$, there is another definition of the location of a first-order transition, which assumes the equilibrium of the coexisting phases near the transition point [15]. In contrast to the usual result for the $q$-state Potts model, where the shift is given by $L^{-d}$ ($L$ being the size of the system and $d$ its dimension), the definition used in ref. [15] leads to exponentially small corrections for the shift. For the perceptron, however, we cannot draw a close analogy with this last scheme, because of the lack of coexisting phases at equilibrium, which is crucial for the application of this definition.

The two classes of transitions, symmetric and asymmetric, also show a rounding behavior for finite $N$, which is given by the scaling of the width of the peak of the diverging observable. In other words, the rounding is the interval over which the singularity is smeared out; it becomes increasingly narrow as the finite dimension of the system goes to infinity. In the concrete case of the simple perceptron, it is the scaling of the width of $\partial P(\alpha,N)/\partial\alpha$ which determines the rounding behavior. Using eq. (2), the derivative reads

$$\frac{\partial P(\alpha,N)}{\partial\alpha} \simeq -\sqrt{\frac{N}{2\pi}}\,\frac{2+\alpha}{2\alpha^{3/2}}\,\exp\!\left[-\frac{N(2-\alpha)^2}{2\alpha}\right], \tag{8}$$

which near $\alpha_c = 2$ is a Gaussian peak of variance $\simeq \alpha/N \simeq 2/N$. The scaling of this variance gives a width proportional to $N^{-\theta}$ with rounding exponent $\theta = 1/2$. Note that a similar behavior occurs for the shift and the variance of the generalization error in the case of a Bayesian perceptron with continuous weights.
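A numerical sketch of the rounding exponent based on eq. (2). It treats $-\partial P/\partial\alpha$ (which integrates to 1, because $P$ falls from 1 to 0) as a density in $\alpha$, takes its standard deviation as the width of the peak, and fits $\theta$ from width $\propto N^{-\theta}$. The choice of the standard deviation as the width measure, the $\pm 8$-width integration window, and the function names are our assumptions.

```python
import math


def minus_dP_dalpha(alpha: float, N: float) -> float:
    """-dP/dalpha for the asymptotic form, eq. (2)."""
    return (math.sqrt(N / (2.0 * math.pi)) * (2.0 + alpha) / (2.0 * alpha ** 1.5)
            * math.exp(-N * (2.0 - alpha) ** 2 / (2.0 * alpha)))


def peak_width(N: float, points: int = 20001) -> float:
    """Standard deviation of -dP/dalpha treated as a density in alpha.

    The midpoint rule is applied on 2 +- 8 sqrt(2/N); this needs N > 32 so that the
    window stays at alpha > 0.
    """
    half = 8.0 * math.sqrt(2.0 / N)
    if half >= 2.0:
        raise ValueError("N must exceed 32")
    h = 2.0 * half / points
    grid = [2.0 - half + (i + 0.5) * h for i in range(points)]
    w = [minus_dP_dalpha(a, N) * h for a in grid]
    mass = sum(w)  # close to 1; dividing by it normalizes the truncated peak
    mean = sum(a * wi for a, wi in zip(grid, w)) / mass
    return math.sqrt(sum((a - mean) ** 2 * wi for a, wi in zip(grid, w)) / mass)


Ns = [100, 400, 1600, 6400]
widths = [peak_width(N) for N in Ns]
for N, w in zip(Ns, widths):
    print(f"N={N:5d}  width={w:.5f}  width*sqrt(N)={w * math.sqrt(N):.4f}")

# rounding exponent: least-squares slope of ln(width) against ln(N), width ~ N**(-theta)
xs = [math.log(N) for N in Ns]
ys = [math.log(w) for w in widths]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
theta = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"estimated rounding exponent theta = {theta:.3f}")
```

In this sketch width$\cdot\sqrt{N}$ approaches $\sqrt{2}$, the value implied by the variance $2/N$ quoted above.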



An interesting problem is what happens in the case of a perceptron with binary weights […]. In this case the numerical analysis of the typical fraction shows a sharper crossover between the two regimes as $N$ increases, but there is no definitive conclusion about the step-like behavior in the thermodynamic limit [18]. An important next step in investigating the shift of the transition would be a calculation of the position of the maximum of the above derivative, as we did for the model with continuous couplings. This will probably lead to different results, since in the binary case the probabilities of separation as a function of $\alpha$ for different $N$ do not intersect at the same point.

In the general case of a perceptron having a larger variety of outputs [19], and $p > d_{VC}$ ($d_{VC}$ being the Vapnik-Chervonenkis (VC) dimension), the fraction of all possible input-output relations obeys the following inequality [20]:

$$P(\alpha,N) \le 2^{1-p}\sum_{k=0}^{d_{VC}-1}\binom{p-1}{k}. \tag{9}$$



It has been shown that in the thermodynamic limit $N \to \infty$ ($p, d_{VC} \to \infty$), keeping $\alpha = p/N$ and $\alpha_{VC} = d_{VC}/N$ fixed, the VC-entropy behaves differently above and below $\alpha = 2\alpha_{VC}$, which makes it possible to relate the storage capacity of the network to its VC dimension via $\alpha_c \le 2\alpha_{VC}$, with $\alpha_c = \alpha_c(\infty)$. (In the case of the single-layer perceptron treated at the beginning, $d_{VC} = N$, $\alpha_{VC} = 1$ and $\alpha_c = 2$.)

Eq. (9) shows that for large $N$ the asymptotic form of the upper bound $\bar P(\alpha,N)$ of the fraction $P(\alpha,N)$ is given by

$$P(\alpha,N) \le \bar P(\alpha,N) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\sqrt{\frac{N}{2\alpha}}\,(2\alpha_{VC} - \alpha)\right)\right], \tag{10}$$

leading to the same values of the shift and rounding exponents for the upper bound as in the case of the simple perceptron.
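A sketch of the same analysis applied to the upper bound (10), with $c = 2\alpha_{VC}$ playing the role of $\alpha_c = 2$. It locates the maximum of $|\partial\bar P/\partial\alpha|$ by bisection on its logarithmic derivative, measures the width of the peak as a standard deviation, and fits $\lambda$ and $\theta$ from log-log slopes. The sampled values of $\alpha_{VC}$, the grid of $N$, and all function names are illustrative assumptions; $\alpha_{VC} = 1$ recovers the simple perceptron.

```python
import math


def log_slope(alpha: float, N: float, c: float) -> float:
    """d/dalpha of ln|dPbar/dalpha| for eq. (10), with c = 2 * alpha_VC."""
    return 1.0 / (c + alpha) - 1.5 / alpha + N * (c * c - alpha * alpha) / (2.0 * alpha * alpha)


def steepest_point(N: float, c: float) -> float:
    """Bisection on [c/2, 3c/2]; log_slope goes from + to - there for the N used below."""
    lo, hi = 0.5 * c, 1.5 * c
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if log_slope(mid, N, c) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def minus_dPbar(alpha: float, N: float, c: float) -> float:
    """-dPbar/dalpha obtained by differentiating eq. (10)."""
    return (math.sqrt(N / (2.0 * math.pi)) * (c + alpha) / (2.0 * alpha ** 1.5)
            * math.exp(-N * (c - alpha) ** 2 / (2.0 * alpha)))


def peak_width(N: float, c: float, points: int = 20001) -> float:
    """Standard deviation of -dPbar/dalpha on c +- 8 sqrt(c/N) (midpoint rule)."""
    half = 8.0 * math.sqrt(c / N)
    if half >= c:
        raise ValueError("N too small for this integration window")
    h = 2.0 * half / points
    grid = [c - half + (i + 0.5) * h for i in range(points)]
    w = [minus_dPbar(a, N, c) * h for a in grid]
    mass = sum(w)
    mean = sum(a * wi for a, wi in zip(grid, w)) / mass
    return math.sqrt(sum((a - mean) ** 2 * wi for a, wi in zip(grid, w)) / mass)


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


Ns = [100, 400, 1600]
results = {}
for alpha_vc in (0.75, 1.0, 1.5):  # alpha_VC = 1 recovers eq. (2) of the simple perceptron
    c = 2.0 * alpha_vc
    logN = [math.log(N) for N in Ns]
    shifts = [abs(steepest_point(N, c) - c) for N in Ns]
    widths = [peak_width(N, c) for N in Ns]
    lam = -slope(logN, [math.log(s) for s in shifts])
    theta = -slope(logN, [math.log(w) for w in widths])
    results[alpha_vc] = (lam, theta)
    print(f"alpha_VC={alpha_vc}: lambda ~ {lam:.3f}, theta ~ {theta:.3f}")
```

Within this sketch $N\,|\alpha^*(N) - 2\alpha_{VC}| \to 1$ and the width behaves as $\sqrt{2\alpha_{VC}/N}$, so the fitted exponents do not depend on $\alpha_{VC}$.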
Using the previous conclusion for the upper limit of $P(\alpha,N)$, identifying $2\alpha_{VC}$ with an "upper" critical storage capacity, and using the inequality between $\alpha_{VC}$ and $\alpha_c$, we derive the following relations:

$$|\alpha(N) - \alpha_c| \ge |\alpha(N) - 2\alpha_{VC}| \sim N^{-\lambda}, \tag{11}$$

with $\lambda = 1$, and

$$|\alpha^*(N) - \alpha_c| \ge |\alpha^*(N) - 2\alpha_{VC}| \sim N^{-\theta}, \tag{12}$$

with $\theta = 1/2$.

Taking into account the last expressions and the fact that the step-like behavior and the main characteristics of the upper bound persist in the general case as the size of the system increases, we conclude that the shift and rounding exponents for the upper bound, eq. (10), are also upper bounds for the shift and rounding exponents in the general case […].

In conclusion, for a single-layer perceptron we used the analogy with the usual FSS theory to derive the shift and rounding critical exponents, $\lambda$ and $\theta$, respectively. A similar analysis in the general case gives results only for the upper limit of the fraction $P(\alpha,N)$. A full understanding of the problem requires additional numerical simulations and an investigation of every concrete architecture and machine.